Spoken Language Identification using Frame Based Entropy Measures
Abstract
This paper presents a real-time method for spoken language identification based on the entropy of the posterior probabilities produced by language-specific phoneme recognisers. Entropy-based discriminant functions, computed on short speech segments, measure how well each model fits a given set of observations, and language identification is then performed as a model selection task. Experiments on a closed set of four Germanic languages from the SpeechDat telephone speech recordings give 95% accuracy for 10-second utterances and 99% accuracy for 20-second utterances.
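The model selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact discriminant functions, so this example makes an assumption: a recogniser that fits the speech well produces peaked (low-entropy) phoneme posteriors, and the language whose recogniser has the lowest mean frame entropy over the segment is selected. The function names `frame_entropy` and `identify_language`, the labels `lang_a` and `lang_b`, and the toy posteriors are all hypothetical.

```python
import numpy as np


def frame_entropy(posteriors, eps=1e-12):
    """Shannon entropy (in nats) of each frame.

    posteriors: array of shape (frames, phonemes); each row is a
    posterior distribution over one recogniser's phoneme set.
    """
    p = np.clip(np.asarray(posteriors, dtype=float), eps, 1.0)
    p = p / p.sum(axis=1, keepdims=True)  # renormalise after clipping
    return -(p * np.log(p)).sum(axis=1)


def identify_language(posteriors_by_language):
    """Pick the language whose recogniser has the lowest mean frame entropy.

    Assumption (not taken from the paper): lower average entropy
    indicates a better model fit to the observed segment.
    """
    scores = {
        lang: float(frame_entropy(post).mean())
        for lang, post in posteriors_by_language.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Toy usage: one recogniser is confident, the other is maximally unsure.
peaked = np.array([[0.97, 0.01, 0.01, 0.01]] * 5)
flat = np.full((5, 4), 0.25)
best, scores = identify_language({"lang_a": peaked, "lang_b": flat})
print(best, scores)
```

Averaging over more frames reduces the variance of the score, which is in line with the reported gain in accuracy when moving from 10-second to 20-second utterances.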
Similar resources
A Maximum Entropy Approach for Spoken Chinese Understanding
In this paper, we present a spoken language understanding method based on the maximum entropy model. We first extract certain features from the corpus, and then train the maximum entropy model with an annotated corpus. We use this model to analyze spoken Chinese into semantic frames. Experiments show that the model can work effectively.
Comparison of Spectral Methods for Spoken Language Identification
Automatic spoken language identification is the task of determining the language from the speech signal. Language identification systems fall into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as a multi-dimensional vector. A statistical model of these features is then obtained for each language. The ...
Language Identification Based on Generative Modeling of Posteriorgram Sequences Extracted from Frame-by-Frame DNNs and LSTM-RNNs
This paper aims to enhance spoken language identification methods based on direct discriminative modeling of language labels using deep neural networks (DNNs) and long short-term memory recurrent neural networks (LSTM-RNNs). In conventional methods, frame-by-frame DNNs or LSTM-RNNs are used for utterance-level classification. Although they have strong frame-level classification performance and r...
Learning Bayesian Networks for Semantic Frame Composition in a Spoken Dialog System
A stochastic approach based on Dynamic Bayesian Networks (DBNs) is introduced for spoken language understanding. DBN-based models make it possible to infer and then compose semantic frame-based tree structures from speech transcriptions. Experimental results on the French MEDIA dialog corpus show the appropriateness of the technique, which both leads to good tree identification results and can provide th...
Spoken Language Identification Using LSTM-Based Angular Proximity
This paper describes the design of an acoustic language identification (LID) system based on LSTMs that directly maps a sequence of acoustic features to a vector in a vector space where angular proximity corresponds to a measure of language/dialect similarity. A specific architecture for the LSTM-based language vector extractor is introduced along with the angular proximity loss function to ...